
Data Semantics Project

Studying the difference between philosophical schools with word embeddings.

This project was developed for the Data Semantics course of the Master's Degree in Data Science.

“The meaning of a word is its use in the language” (Wittgenstein, 1953)

This remark by the philosopher Ludwig Wittgenstein captures in a nutshell the main idea behind word embedding algorithms, and thus serves as a perfect symbolic bridge between philosophy and data semantics.
Philosophy is a purely human subject: every individual in human history has had to wrestle with the questions that existence poses to us. This means that the texts of this subject are many, extensive, and inevitably fragmented into different schools of thought, different ways of perceiving what we know. Keeping track of them is difficult, with so many thoughts and so many points of view.
But all these different answers always start from the same questions: those of humanity. So why not use these word embedding algorithms to search, in the thought of the various philosophical currents, for the main differences in the answers to the same questions?

An initial archive of texts for answering these questions was available from The Philosophy Data Project, which shares similar interests to mine and makes its data available on the Kaggle platform. An initial analysis of the corpus, however, showed that it was too small for training word embedding algorithms. I therefore extensively enriched the file with all of the philosophers' books available on Project Gutenberg. The resulting division into schools of thought is:

  1. Nihilism: Nietzsche, Kierkegaard
  2. Empiricism: Berkeley, Hume, Locke
  3. Rationalism: Descartes, Leibniz, Malebranche, Spinoza
  4. German Idealism: Fichte, Hegel, Kant
  5. Analytics: Kripke, Lewis, Moore, Popper, Quine, Russell, Wittgenstein
  6. Aristotle: Aristotle
  7. Plato: Plato
An eighth slice was built from Wikipedia abstracts. This additional slice is useful in two ways: it provides more data to train the compass, and it makes it possible to draw comparisons with a "neutral" slice.

The analysis was done with three algorithms: Word2Vec, CADE and SWEAT. I then implemented four further simple tools (functions) to better carry out an exploratory analysis across the 8 slices.

The results of the work were very interesting: the word embeddings captured semantic differences in the meaning of some words, in line with what we would expect from the philosophical schools. The analytics' view of logic as the foundation for understanding the universe is, for example, reflected in the embeddings of both "logic" and "mathematics". The same can be said for "language", which is linked with "usage" for the analytics, with "composition" for the idealists and with "persuasion" for Aristotle, in accordance with the three schools' views on the matter. The similarity between the embeddings of "idea" and "reality" for the idealists is also evident. For Plato, the words "idea" and "innate" and the words "evil" and "ignorance" are significantly more similar than in the other slices, consistent with Plato's thought. The same can be said for empiricism, in which the concept of the "blank slate" is reflected in the proximity of the embeddings of the words "mind" and "empty". An analysis of the polarization of the embeddings shows how Nihilism's darker view of existence is reflected in the embeddings trained on that slice. Lastly, a comparison between the meanings of "knowledge" in the empiricist and rationalist schools reflects the views of the two currents: the word is linked with "experience" in the former and with "intelligence" in the latter.
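The pairwise comparisons described above essentially amount to computing, within each slice, the cosine similarity between two word vectors and then comparing that number across slices. The following is only a minimal sketch of that idea, not the project's actual code: it assumes each slice's embeddings are available as a plain dictionary from word to vector, and the function names, the dictionary layout and the toy vectors are all hypothetical and invented for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pair_similarity_by_slice(slices, word_a, word_b):
    """Return {slice_name: cosine(word_a, word_b)}.

    Slices that do not contain both words are skipped.
    """
    result = {}
    for name, vectors in slices.items():
        if word_a in vectors and word_b in vectors:
            result[name] = cosine(vectors[word_a], vectors[word_b])
    return result


# Toy 3-dimensional vectors, made up for this example (NOT project results).
toy_slices = {
    "empiricism": {"mind": [1.0, 0.0, 0.0], "empty": [1.0, 1.0, 0.0]},
    "rationalism": {"mind": [1.0, 0.0, 0.0], "empty": [0.0, 1.0, 0.0]},
    "wikipedia": {"mind": [1.0, 0.0, 0.0]},  # "empty" is missing here
}

scores = pair_similarity_by_slice(toy_slices, "mind", "empty")

# Rank the slices from the most to the least similar pair.
ranked = sorted(scores, key=scores.get, reverse=True)
for name in ranked:
    print(f"{name}: {scores[name]:.3f}")
```

With real embeddings, the dictionaries would be filled from the trained slice models instead of hand-written vectors; the ranking step then shows in which school a given pair of words is closest.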
Finally, a conclusion section highlights the concepts that have not been captured by the word embeddings, and tries to understand what could have been done to achieve better results.

Tags

Data Semantics, Word Embeddings, Word2Vec, CADE, WEAT, NLP